ggplot() creates the base layer of a plot. geom functions such as geom_point() add layers on top of the base layer, and each geom draws the data as a different kind of visual object: points, lines, and so on.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, colour = class)) #Change point colour by class
Can we change point size by class?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
## Warning: Using size for a discrete variable is not advised.
Can we change point transparency by class? Yes, we use the argument
alpha= for this.
This is very helpful for dense point clouds, where you want a reader to
see that there are overlapping data points.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
## Warning: Using alpha for a discrete variable is not advised.
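For the dense-point-cloud case, a fixed transparency is often enough. The sketch below sets alpha outside aes() so that it applies to every point; the value 0.3 and the name p_alpha are arbitrary choices for illustration, not something prescribed by the text above.

```r
library(ggplot2)

# alpha is set outside aes(), so it is a constant applied to all points.
# Overlapping points then show up as darker regions.
p_alpha <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.3)

p_alpha
```

Because alpha is a constant here rather than a mapping, no legend is drawn and no warning about discrete variables is raised.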
Change point shape by class.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
## Warning: The shape palette can deal with a maximum of 6 discrete values because more
## than 6 becomes difficult to discriminate
## ℹ you have requested 7 values. Consider specifying shapes manually if you need
## that many have them.
## Warning: Removed 62 rows containing missing values or values outside the scale range
## (`geom_point()`).
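The warning above suggests specifying shapes manually. One possible way to do that is scale_shape_manual(), sketched here with shape codes 0 to 6; the particular codes and the name p_shape are illustrative assumptions, and any seven valid shape codes would do.

```r
library(ggplot2)

# mpg$class has seven levels, so seven shape codes are supplied.
# Codes 0:6 are hollow or line-only symbols that stay distinguishable.
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)

p_shape
```

With one shape per class, no rows are dropped for lack of a shape.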
We can set properties manually as well, using a number or a colour.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
#all the points are blue in colour now
There are several aesthetics that can be set manually: the colour, size, and shape of the points, among others.
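As a small sketch of the difference between setting and mapping an aesthetic: constants go outside aes(), while anything inside aes() is treated as data. The specific values (darkgreen, size 3, shape 17) and object names are arbitrary choices for illustration.

```r
library(ggplot2)

# Set manually: the values sit outside aes() and apply to every point.
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             colour = "darkgreen", size = 3, shape = 17)

# Common slip: inside aes(), "blue" is treated as a one-level variable,
# so the points get the default palette colour plus a legend, not blue.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = "blue"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

layer_data() returns the values ggplot2 actually uses for drawing, which makes it a convenient way to check whether an aesthetic was set or mapped.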
Faceting splits a plot into separate panels, most often
to show subsets of your data side by side.
To facet your plot by a single variable, use
facet_wrap().
The ~ specifies which variable to subset your data
with.
#Note: only use facet_wrap() for discrete variables.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2) #making separate plots as a function of 'class'
If you want to use a combination of two variables to facet plots, use facet_grid(); separating the variables by a ~.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
What if you do not want to facet in the rows or the columns dimension? Put a . in place of the variable on that side of the ~. For example, . ~ cyl facets by cyl in columns only, so the panels are placed next to one another.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~cyl)
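The mirror-image case, faceting in rows only, can be sketched by putting the . on the right-hand side of the ~. The choice of drv as the row variable and the name p_rows are just for illustration.

```r
library(ggplot2)

# drv ~ . : one row of panels per value of drv, no column faceting.
p_rows <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

p_rows
```

Since drv has three values, this gives three panels stacked vertically that share a common x-axis.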
In ggplot2 we can use a variety of visual objects to represent our data points.
1. Display data as points
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
2. Display data as a smoothed line
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Quick recap: a geom is the geometric
object that your plot uses to represent the data.
As a budding scientist, it is important to note which geom functions worked
and which didn’t, so use comments to keep track of the mappings you tried.
ggplot(data = mpg) +
# geom_point(mapping = aes(x = displ, y = hwy)) # points horrible
# geom_smooth(mapping = aes(x = displ, y = hwy, colour = class)) # try smooth line, doesn't look legible
geom_smooth(mapping = aes(x = displ, y = hwy, colour = drv, linetype = drv)) # based on drv variable, changing the type of line
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
How does grouping objects by categorical variables work in ggplot2?
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#The argument 'group=' is used for this
ggplot(data = mpg) +
geom_smooth(
mapping = aes(x = displ, y = hwy, color = drv),
show.legend = FALSE
) #Change the colour of each line based on drv value
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Let’s try combining both points and lines on the same graph
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, colour = drv)) +
#first, put a layer of the points
geom_smooth(mapping = aes(x = displ, y = hwy, colour = drv, linetype = drv))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#second, add a layer of lines on top
#Mappings shared by both layers can be declared once in ggplot(); every geom inherits them, which shortens the code chunk to this
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Adding new features to the plot
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) + #use mappings to display different aesthetics
geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#specify different data for each layer, here we subset the data for plotting a line
Simple bar plot to start our learning
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
Though we have provided R with only the x variable, it still outputs a plot. How is R calculating the y values? geom_bar() counts the rows in each category of cut (it “bins” our data) and then plots those counts.
You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using stat_count() instead of geom_bar().
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
#1. If I try adding a y variable such as frequency(x), R will throw an error
# ggplot(data = diamonds) +
# stat_count(mapping = aes(x = cut, y = frequency(x)))
#2. If I try doing the same thing with geom_bar(), R will still throw an error. Why?
# ggplot(data = diamonds) +
# geom_bar(mapping = aes(x = cut, y = frequency(x)))
#This is because in both cases the count stat computes the other axis itself, so only one of the x or y aesthetics may be specified.
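To see what the count stat actually computes, one option is to pull the computed values out of the plot with layer_data() and compare them with a plain tabulation. This is only a sketch for inspection; the names p_bar and bar_counts are made up for the example.

```r
library(ggplot2)

p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# The 'count' column holds the bar heights computed by stat_count().
bar_counts <- layer_data(p_bar)$count
bar_counts

# The same numbers from a base R tabulation of the cut column.
table(diamonds$cut)
```

The two results agree, which is why no y aesthetic is needed (or allowed) for a count-based bar chart.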
Can we override a default stat (which is a count or a summary) to identity (which is the raw value of a variable)? Yes.
#Example
demo <- tribble(
~cut, ~freq,
"Fair", 1610,
"Good", 4906,
"Very Good", 12082,
"Premium", 13791,
"Ideal", 21551
)
demo
## # A tibble: 5 × 2
## cut freq
## <chr> <dbl>
## 1 Fair 1610
## 2 Good 4906
## 3 Very Good 12082
## 4 Premium 13791
## 5 Ideal 21551
ggplot(data = demo) +
geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
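ggplot2 also provides geom_col(), which uses the identity stat by default and so gives the same kind of plot without the stat argument. The sketch below rebuilds the demo table with a base R data.frame (an assumption made only to keep the example self-contained) and uses the illustrative names demo_df and p_col.

```r
library(ggplot2)

# Same values as the demo table, stored in a plain data.frame.
demo_df <- data.frame(
  cut  = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq = c(1610, 4906, 12082, 13791, 21551)
)

# geom_col() plots the y values as given; no counting is done.
p_col <- ggplot(data = demo_df) +
  geom_col(mapping = aes(x = cut, y = freq))

p_col
```

Because cut is a character column here, the bars are ordered alphabetically rather than by quality.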
#You can also override a default mapping from transformed variables to aesthetics
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
#after_stat(prop) replaces stat(prop), which was deprecated in ggplot2 3.4.0
Can we plot statistical details using ggplot2?
ggplot(data = diamonds) +
stat_summary(
mapping = aes(x = cut, y = depth),
fun.min = min,
fun.max = max,
fun = median
) #fun computes the plotted point for each cut (here the median); fun.min and fun.max set the two ends of the range line.
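As a rough check on what stat_summary() draws, the computed values can be extracted with layer_data() and compared with a direct calculation. The names p_sum, summary_vals and depth_median are illustrative, and tapply() is just one of several ways to compute group medians.

```r
library(ggplot2)

p_sum <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )

# ymin, y and ymax are the min, median and max of depth for each cut.
summary_vals <- layer_data(p_sum)[, c("x", "ymin", "y", "ymax")]
summary_vals

# Median depth per cut computed directly, for comparison with column y.
depth_median <- tapply(diamonds$depth, diamonds$cut, median)
depth_median
```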
Three scenarios trying different aspects of aesthetics in ggplot2
#colour -> gives border colours for each of the bars in the plot
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
#fill -> adds colour to the bars
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
#Here, the fill argument is assigned to another variable in the 'diamonds' dataset called 'clarity'
#Notice how the stacking is done automatically. This is done behind the scenes with a position argument.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity)) #Looks very messy
#Try altering transparency of bars in bar plot
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
#To color the bar outlines with no fill color
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA, position = "identity")
#Let's make stacked bar plots
#position = "fill" works like stacking, but makes each set of stacked bars the same height.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
#position = "dodge" places overlapping objects directly beside one another.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
#Using jitter for scatterplots
#position = "jitter" adds a small amount of random noise to each point to avoid overplotting when points overlap. This is useful for scatterplots but not barplots.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
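If the default amount of noise is too much or too little, position_jitter() lets you control it. In this sketch the width, height and seed values are arbitrary assumptions chosen for illustration; the seed simply makes the jitter reproducible between runs.

```r
library(ggplot2)

# width/height bound the random shift in data units on each axis.
p_jit <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.1, height = 0.5, seed = 42)
  )

p_jit
```

Keeping the jitter small matters because it trades a little positional accuracy for visibility of overlapping points.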
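#General template for a layered ggplot2 call; replace the capitalised placeholders with your own values
ggplot(data = DATA) +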
GEOM_FUNCTION(
mapping = aes(MAPPINGS),
stat = STAT,
position = POSITION
) +
FACET_FUNCTION
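One way to read the template is to fill in each placeholder with a concrete value. The sketch below does this with the diamonds data; the particular choices (a dodged bar chart faceted by color) and the name p_template are only an example, not the one intended use of the template.

```r
library(ggplot2)

p_template <- ggplot(data = diamonds) +      # DATA
  geom_bar(                                  # GEOM_FUNCTION
    mapping = aes(x = cut, fill = clarity),  # MAPPINGS
    stat = "count",                          # STAT (the default for geom_bar)
    position = "dodge"                       # POSITION
  ) +
  facet_wrap(~ color)                        # FACET_FUNCTION

p_template
```

In practice the stat and position arguments are usually left out unless you want something other than the geom's defaults.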
Using the labs() function
#Scenario 1: title, subtitle, caption
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov"
)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#Scenario 2: axes labels and legend titles
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
colour = "Car type"
)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
What if you want to add text to the plot directly? Here we use geom_text() to add textual labels to our plots. This works similarly to geom_point(), but rather than drawing a shape it draws a label.
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_text(aes(label = model), data = best_in_class)
#To move labels off the points they annotate, use the nudge_x and nudge_y arguments of geom_text(); check_overlap = TRUE drops labels that would overlap earlier ones
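A minimal sketch of label placement with geom_text(), using its nudge_y and check_overlap arguments. To stay self-contained it picks the best car per class with base R rather than dplyr, which is an assumption of this example; the nudge distance of 2 is likewise arbitrary.

```r
library(ggplot2)

# First row with the highest hwy in each class (base R version).
best_in_class <- do.call(
  rbind,
  lapply(split(mpg, mpg$class), function(d) d[which.max(d$hwy), ])
)

p_text <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(
    aes(label = model),
    data = best_in_class,
    nudge_y = 2,           # shift labels 2 units up, off the points
    check_overlap = TRUE   # skip labels that would overlap earlier ones
  )

p_text
```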
We can change the default scales by tweaking the values in the scale parameters.
Observe how the readability of the graphs changes with tweaks in the scale of the x-axis.
#Scenario 1
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
#Scenario 2
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous(limits = c(0, 15)) +
scale_y_continuous() +
scale_colour_discrete()
#Scenario 3
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous(limits = c(0, 10)) +
scale_y_continuous() +
scale_colour_discrete()
#Scenario 4
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous(limits = c(0, 8)) +
scale_y_continuous() +
scale_colour_discrete()
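One thing to keep in mind when narrowing limits: scale limits discard data that fall outside them, whereas coord_cartesian() only zooms the view. The sketch below illustrates the difference; the limits c(2, 5) are chosen purely so that some points fall outside, and coord_cartesian() is brought in here only as a point of comparison.

```r
library(ggplot2)

# Scale limits: displ values outside 2 to 5 become NA and are not drawn.
p_scale <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous(limits = c(2, 5))

# Coordinate limits: all data are kept, only the visible window changes.
p_zoom <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 5))

sum(is.na(layer_data(p_scale)$x))  # number of points dropped by the scale
sum(is.na(layer_data(p_zoom)$x))   # no points dropped when zooming
```

The distinction matters most for layers such as geom_smooth(), where dropping data changes the fitted line.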
What if you want to change the ticks on the axes?
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))
#Notice how the y-axis has breaks in multiples of 5, from 15 to 40.
#The function seq() outputs a sequence of numbers separated by a specified increment (here by = 5).
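To see exactly which tick positions are passed to breaks, the seq() call can be run on its own. The variable name y_breaks is just for illustration.

```r
# Sequence from 15 to 40 in steps of 5.
y_breaks <- seq(15, 40, by = 5)
y_breaks
#> [1] 15 20 25 30 35 40
```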
base <- ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
base + theme(legend.position = "left")
base + theme(legend.position = "top")
base + theme(legend.position = "bottom")
base + theme(legend.position = "right") # the default
#To suppress the display of the legend altogether use
# legend.position = 'none'
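Written out in full, suppressing the legend might look like the sketch below. It rebuilds the base plot so the example runs on its own; the name p_nolegend is made up for the example.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

# The colours are still mapped to class, but no legend is drawn.
p_nolegend <- base + theme(legend.position = "none")

p_nolegend
```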
How can you change the colour scales?
#Use a ready-made colour palette, here the ColorBrewer palette "Set1"
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = drv, shape = drv)) +
scale_colour_brewer(palette = "Set1")
#Setting colours manually
presidential %>%
mutate(id = 33 + row_number()) %>%
ggplot(aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
You can customise the entire theme of your plot.
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_bw()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_light()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_dark()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## You can also set all the arguments for theme() yourself.
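As a sketch of setting theme() arguments yourself, the example below starts from theme_bw() and overrides a few elements. Which elements to change is a matter of taste; the ones picked here (legend position, minor grid lines, axis titles) and the name p_theme are only illustrative.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  theme_bw() +
  theme(
    legend.position = "bottom",                          # move the legend
    panel.grid.minor = element_blank(),                  # remove minor grid lines
    axis.title = element_text(size = 12, face = "bold")  # restyle axis titles
  )

p_theme
```

Settings in theme() are applied after the complete theme, so they override only the elements named and leave the rest of theme_bw() intact.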